
[kube-prometheus-stack] Add upgrade path from stable/prometheus-operator #119

Conversation

fktkrt
Contributor

@fktkrt fktkrt commented Sep 19, 2020

What this PR does / why we need it:

I had to migrate our stable/prometheus-operator to the new chart while keeping the existing PersistentVolume.
To do this I had to perform the steps provided below. After the procedure I was able to access the old time-series data stored in the PV.
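The PV-retention flow described above can be sketched roughly as follows. This is an unverified outline, not the verbatim guide: `<old-pv-name>`, the `monitoring` namespace, and the release names are placeholders that must be checked against your own installation.

```shell
# Hedged sketch of the migration's PV-retention steps.
# First verify the actual object names:
kubectl get pv,pvc -n monitoring

# 1. Make sure the volume survives when its claim is deleted:
kubectl patch pv <old-pv-name> \
  -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'

# 2. Remove the old release (the PV is kept thanks to Retain):
helm uninstall prometheus-operator -n monitoring

# 3. Clear the stale claim reference so a new PVC can bind to the PV:
kubectl patch pv <old-pv-name> --type json \
  -p '[{"op":"remove","path":"/spec/claimRef"}]'

# 4. Install the new chart with a storageSpec matching the retained PV.
helm install kube-prometheus-stack \
  prometheus-community/kube-prometheus-stack -n monitoring
```

The `Retain` reclaim policy plus the `claimRef` removal are what let the old time-series data reappear under the new release.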

Which issue this PR fixes

Special notes for your reviewer:

This guide covers a fresh installation of the new chart, so non-persisted configuration (e.g. in values.yaml, custom dashboards) may be lost after the process is finished.

Checklist

  • DCO signed
  • Chart Version bumped
  • Title of the PR starts with chart name (e.g. [prometheus-couchdb-exporter])

@fktkrt fktkrt force-pushed the add-prometheus-operator-migration-guide branch from d2a10a6 to de00619 Compare September 19, 2020 22:39
@gkarthiks gkarthiks changed the title [prometheus-couchdb-exporter] Add upgrade path from stable/prometheus-operator [kube-prometheus-stack] Add upgrade path from stable/prometheus-operator Sep 20, 2020
@fktkrt fktkrt force-pushed the add-prometheus-operator-migration-guide branch from 409d910 to 09cfc5f Compare September 20, 2020 09:31
Member

@gkarthiks gkarthiks left a comment

LGTM, let's wait for one more reviewer.

@scottrigby
Member

Thanks 😊 This is great if a user needs to upgrade the release name or namespace.

But given that users can do a full name override to match their existing release, would you be willing to add a similar scenario for retaining PVs for an in-place upgrade?

Also is it clear to everyone what data is being persisted here, and how exactly to do that? Users can persist the grafana subchart user account and dashboard data as well, right?
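For reference, the in-place route mentioned above could look roughly like this values.yaml fragment. This is a sketch only, assuming the old release was literally named prometheus-operator; the storage size is a placeholder.

```yaml
# Sketch: keep generated resource names identical to the old
# stable/prometheus-operator release so existing PVCs keep matching.
fullnameOverride: prometheus-operator

prometheus:
  prometheusSpec:
    storageSpec:
      volumeClaimTemplate:
        spec:
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 50Gi   # placeholder; match the existing PVC size
```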

@scottrigby
Member

Alternatively, we could merge this as-is, and allow other members of the community to document how to do so for an in-place upgrade. WDYT?

@gkarthiks
Member

@scottrigby I like your second option: merge this as-is to support @fktkrt's contribution, and other community members (or maybe @fktkrt as well) might be interested in contributing the in-place upgrade documentation.

@scottrigby
Member

scottrigby commented Sep 21, 2020

Before this is merged, let's have someone try these steps to ensure they work properly. I'm not sure when I'll get to that but once that's verified by at least one other person we can merge this.

I created #121 to make sure we don't forget to address scenarios and details I asked about above.

@fktkrt fktkrt force-pushed the add-prometheus-operator-migration-guide branch from 91eeb1c to 080e5d6 Compare September 21, 2020 10:16
@fktkrt
Contributor Author

fktkrt commented Sep 21, 2020

Just performed it in another cluster, added a missing step and cleaned up the second patch syntax.

@fktkrt
Contributor Author

fktkrt commented Sep 29, 2020

@m42u Please see my recent changes to this PR based on your comments.
I've omitted deleting the endpoint, because that should be taken care of when removing the service itself.

... there's a volumeName under prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec that might help, but this needs validation.

@desaintmartin, there's this option that might solve your comment; any help validating it is welcome, as I have limited time at the moment.
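The unvalidated volumeName idea mentioned above would look something like this; the PV name is a placeholder.

```yaml
# Unvalidated, as noted above: pin the new chart's PVC directly
# to the retained PersistentVolume by name.
prometheus:
  prometheusSpec:
    storageSpec:
      volumeClaimTemplate:
        spec:
          volumeName: <existing-pv-name>   # placeholder: the retained PV
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 50Gi
```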

@flouthoc

@fktkrt This path requires me to uninstall the previous chart release; wouldn't this cause any downtime?

@fktkrt
Contributor Author

fktkrt commented Sep 30, 2020

> @fktkrt This path requires me to uninstall the previous chart release; wouldn't this cause any downtime?

@flouthoc Yes, from the point the old release is removed until the new one is installed (1–2 minutes), you won't have metrics.
If you are looking for a zero-downtime upgrade, Helm's fullnameOverride option might be a good fit, as others have pointed out, although I don't have any personal experience with it (at least in terms of PVs/PVCs).

Do you maybe want to give it a try? :)

@flouthoc

flouthoc commented Oct 1, 2020

@fktkrt Thanks, I guess 1–2 minutes won't do any damage. I am performing a drill on a dummy cluster to record the downtime delta; after that, I guess we are good to go. Any reason why this guide is not merged to master yet?

@fktkrt
Contributor Author

fktkrt commented Oct 3, 2020

> @fktkrt Thanks, I guess 1–2 minutes won't do any damage. I am performing a drill on a dummy cluster to record the downtime delta; after that, I guess we are good to go. Any reason why this guide is not merged to master yet?

@scottrigby said let's first wait for a few more successful migration reports, then we can merge this.
So @flouthoc, can you report one more for us? :)

@shaikatz

shaikatz commented Oct 7, 2020

I just did a migration from the jsonnet version of kube-prometheus to the Helm chart using these instructions, more or less, and it worked perfectly. I think this should get merged :-)

@fktkrt fktkrt force-pushed the add-prometheus-operator-migration-guide branch 4 times, most recently from 856abac to 2ec1163 Compare October 8, 2020 09:27
@scottrigby
Member

We have some good user validation of the steps, so let's merge this. It just needs conflicts to be resolved.

@scottrigby
Member

I have resolved the conflict

scottrigby previously approved these changes Oct 9, 2020
@scottrigby
Member

Fixed markdownlint on your branch @fktkrt

@scottrigby scottrigby merged commit a72459c into prometheus-community:main Oct 9, 2020
@llamahunter

The prometheus.prometheusSpec.serviceMonitorSelector changes with the new chart from release: prometheus-operator to release: kube-prometheus-stack. If you have a lot of ServiceMonitor objects around, do you then need to go through and manually update all your existing ServiceMonitor objects? Is there any way to make it continue to use the old selector? Looking at the helm chart, it didn't seem possible, as the Release.Name is hard coded into the ServiceMonitor objects created by kube-prometheus-stack.
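To illustrate the selector change being discussed: the object name and port below are examples, while the release labels are the two charts' default release names.

```yaml
# A ServiceMonitor selected by the old chart's Prometheus carried:
#   release: prometheus-operator
# The new chart's default serviceMonitorSelector instead expects:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: foo-servicemonitor          # example name
  labels:
    release: kube-prometheus-stack  # must match the new Helm release name
spec:
  selector:
    matchLabels:
      app: foo
  endpoints:
    - port: metrics                 # example port name
```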

@fktkrt
Contributor Author

fktkrt commented Oct 16, 2020

I had no issues regarding this with the following setup:

I am using ServiceMonitors and PodMonitors, and labeling those this way:

apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: foo-podmonitor
spec:
  selector:
    matchLabels:
      app: foo
  podMetricsEndpoints:
    - port: metrics   # required field; port name from the target pod spec

This is my prometheusSpec:

prometheus:
  prometheusSpec:
    ruleSelectorNilUsesHelmValues: false
    serviceMonitorSelectorNilUsesHelmValues: false
    podMonitorSelectorNilUsesHelmValues: false

Additionally, for my custom environments I am overriding these with externalLabels like this:

prometheus.prometheusSpec.externalLabels.prometheus=dev-prometheus

My Prometheus instance has default labels:

Labels:       app=prometheus
              controller-revision-hash=prometheus-kube-prometheus-stack-prometheus-677954c66f
              prometheus=kube-prometheus-stack-prometheus
              statefulset.kubernetes.io/pod-name=prometheus-kube-prometheus-stack-prometheus-0

and the new release can continue scraping all my existing Pod/ServiceMonitors created with the legacy chart.

If you have access to a test cluster (or can deploy a playground with k3s or similar), you can easily double-check it, just to make sure nothing breaks.

@llamahunter

By turning off the rule/serviceMonitor/podMonitor selectors, doesn't that make it difficult to tell an instance of prometheus to only monitor things that are related to that instance? What if you have multiple prometheuses (promethei?) running in your cluster?

@fktkrt
Contributor Author

fktkrt commented Oct 16, 2020

You can use additional labels to tie apps to specific prometheus instances.
Another idea: if you're using additional charts to manage the CRDs, you can solve this by modifying only the template of your CRD-generator chart.
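A sketch of that label-based scoping, assuming a hypothetical team label: each Prometheus instance selects only the monitors carrying its label, instead of disabling the selectors entirely.

```yaml
prometheus:
  prometheusSpec:
    # Select explicitly rather than setting the NilUsesHelmValues
    # flags to false:
    serviceMonitorSelector:
      matchLabels:
        team: payments   # hypothetical label on this instance's monitors
    podMonitorSelector:
      matchLabels:
        team: payments
```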

(By the way, you were close with promethei, https://prometheus.io/docs/introduction/faq/#what-is-the-plural-of-prometheus :) )

stamzid pushed a commit to Unstructured-IO/prometheus-community-helm-charts that referenced this pull request Mar 3, 2023
…tor (prometheus-community#119)

* add upgrade path from stable/prometheus-operator

Signed-off-by: fktkrt <[email protected]>

* bump chart

Signed-off-by: fktkrt <[email protected]>

* fix lint

Signed-off-by: fktkrt <[email protected]>

* fix lint

Signed-off-by: fktkrt <[email protected]>

* generalize object references

Signed-off-by: fktkrt <[email protected]>

* fix kubectl patch syntax, add missing step

Signed-off-by: fktkrt <[email protected]>

* bump Chart version

Signed-off-by: fktkrt <[email protected]>

* add step to remove legacy kubelet service

Signed-off-by: fktkrt <[email protected]>

* add instructions to specify AZ trough labels

Signed-off-by: fktkrt <[email protected]>

* remove unnecessary namespace reference, mention CRD provisioning, add missing zone identifier

Signed-off-by: fktkrt <[email protected]>

* add volumeClaimTemplate example

Signed-off-by: fktkrt <[email protected]>

* Fix markdownlint. Also change 'bash' to appropriate linguist codeblock option

Signed-off-by: Scott Rigby <[email protected]>

Co-authored-by: Scott Rigby <[email protected]>
Development

Successfully merging this pull request may close these issues.

[kube-prometheus-stack] Can't find upgrade procedure from stable/prometheus-operator